Detecting Japanese idioms with a linguistically rich dictionary

Authors

  • Chikara Hashimoto
  • Satoshi Sato
  • Takehito Utsuro
Abstract

Detecting idioms in a sentence is important to sentence understanding. This paper discusses the linguistic knowledge for idiom detection. The challenges are that idioms can be ambiguous between literal and idiomatic meanings, and that they can be “transformed” when expressed in a sentence. However, there has been little research on Japanese idiom detection with its ambiguity and transformations taken into account. We propose a set of linguistic knowledge for idiom detection that is implemented in an idiom dictionary. We evaluated the linguistic knowledge by measuring the performance of an idiom detector that exploits the dictionary. As a result, more than 90% of the idioms are detected with 90% accuracy.

Similar resources

Review: A Dictionary of Common Persian Idioms and Expressions (Persian-English)

A Dictionary of Common Persian Idioms and Expressions (Persian-English), compiled by Professor Mohammad Reza Bateni with the assistance of Zahra Ahmadinia, was published in 1392 (Solar Hijri) by Farhang Moaser Publishers in 1089 pages. English has numerous dictionaries whose entries are devoted to explaining the idioms and expressions of the language. The Collins COBUILD Dictionary of Idioms[1], the Cambridge Dictionary of Idioms[2], and the American Heritage Dictionary of Idioms[3] are examples of such...

NTT DATA at TREC-7: System Approach for Ad-Hoc and Filtering

In TREC-7, we participated in the ad-hoc task (main task) and the filtering track (sub task). In the ad-hoc task, we adopted a scoring method that used co-occurrence term relations in a document and specific processing in order to determine which conceptual parts of the documents should be targeted for query expansion. In filtering, we adopted a machine-readable dictionary for detecting idioms and an...

Making an XML-based Japanese-Slovene Learners' Dictionary

In this paper we present a hypertext dictionary of Japanese lexical units for Slovene students of Japanese at the Faculty of Arts of Ljubljana University. The dictionary is planned as a long-term project in which a simple dictionary is to be gradually enlarged and enhanced, taking into account the needs of the students. Initially, the dictionary was encoded in a tabular format, in a mixture of ...

Idiomatic Expressions in VerbaLex

Idiomatic expressions are part of everyday language; therefore, NLP applications that can “understand” idioms are desirable. The nature of idioms is somewhat heterogeneous: idioms form classes differing in many aspects (e.g. syntactic structure, lexical and syntactic fixedness). Although dictionaries of idioms exist, they usually do not contain information about fixedness or frequency since they...

A New Dictionary Construction Method in Sparse Representation Techniques for Target Detection in Hyperspectral Imagery

Hyperspectral data in remote sensing, gathered at fine spectral resolution (about 10 nanometers), contain a plethora of spectral bands (roughly 200 bands). Since precious information about the spectral features of target materials can be extracted from these data, they have been used exclusively in hyperspectral target detection. One of the problems associated with the detect...

Journal:
  • Language Resources and Evaluation

Volume: 40  Issue:

Pages: -

Publication date: 2006